
    Approximating multivariate posterior distribution functions from Monte Carlo samples for sequential Bayesian inference

    An important feature of Bayesian statistics is the opportunity to do sequential inference: the posterior distribution obtained after seeing a dataset can be used as prior for a second inference. However, when Monte Carlo sampling methods are used for inference, we only have a set of samples from the posterior distribution. To do sequential inference, we then either have to evaluate the second posterior at only these locations and reweight the samples accordingly, or we can estimate a functional description of the posterior probability distribution from the samples and use that as prior for the second inference. Here, we investigated to what extent we can obtain an accurate joint posterior from two datasets if the inference is done sequentially rather than jointly, under the condition that each inference step is done using Monte Carlo sampling. To test this, we evaluated the accuracy of kernel density estimates, Gaussian mixtures, vine copulas and Gaussian processes in approximating posterior distributions, and then tested whether these approximations can be used in sequential inference. In low dimensionality, Gaussian processes are more accurate, whereas in higher dimensionality Gaussian mixtures or vine copulas perform better. In our test cases, posterior approximations are preferable over direct sample reweighting, although joint inference is still preferable over sequential inference. Since the performance is case-specific, we provide an R package mvdens with a unified interface for the density approximation methods
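The direct sample-reweighting route mentioned in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's method or the mvdens interface: it assumes a one-dimensional normal model with known noise, uses exact posterior draws as a stand-in for Monte Carlo output, and all function names (`posterior_normal`, `reweight`) are hypothetical.

```python
import math
import random

def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2

def posterior_normal(prior_mean, prior_sd, data, noise_sd):
    """Conjugate update for a normal mean with known noise sd (toy assumption)."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = len(data) / noise_sd ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + sum(data) / noise_sd ** 2)
    return post_mean, math.sqrt(post_var)

def reweight(samples, data2, noise_sd):
    """Self-normalised weights proportional to the likelihood of the second dataset."""
    log_w = [sum(normal_logpdf(y, th, noise_sd) for y in data2) for th in samples]
    top = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]

rng = random.Random(0)
noise_sd = 1.0
data1 = [rng.gauss(2.0, noise_sd) for _ in range(20)]
data2 = [rng.gauss(2.0, noise_sd) for _ in range(20)]

# Step 1: posterior after the first dataset; exact draws stand in for MCMC samples.
m1, s1 = posterior_normal(0.0, 10.0, data1, noise_sd)
samples = [rng.gauss(m1, s1) for _ in range(5000)]

# Step 2: reweight those samples by the likelihood of the second dataset.
weights = reweight(samples, data2, noise_sd)
seq_mean = sum(w * th for w, th in zip(weights, samples))

# Reference: joint inference on both datasets at once.
joint_mean, joint_sd = posterior_normal(0.0, 10.0, data1 + data2, noise_sd)

# Effective sample size shows how much the reweighting degrades the sample set.
ess = 1.0 / sum(w * w for w in weights)
print(seq_mean, joint_mean, ess)
```

In this toy case the reweighted mean tracks the joint posterior mean closely, but the effective sample size drops below the number of draws. That loss is one reason a functional approximation of the first posterior can be attractive when the two datasets pull the posterior in different directions.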

    Crustal structure of northern Italy from the ellipticity of Rayleigh waves

    Northern Italy is a diverse geological region, including the wide and thick Po Plain sedimentary basin, which is bounded by the Alps and the Apennines. The seismically slow shallow structure of the Po Plain is difficult to retrieve with classical seismic measurements such as surface wave dispersion, yet the detailed structure of the region greatly affects seismic wave propagation and hence seismic ground shaking. Here we invert Rayleigh wave ellipticity measurements in the period range 10–60 s for 95 stations in northern Italy using a fully nonlinear approach to constrain vertical vS, vP and density profiles of the crust beneath each station. The ellipticity of Rayleigh wave ground motion is primarily sensitive to shear-wave velocity beneath the recording station, which reduces along-path contamination effects. We use the 3D layering structure in MAMBo, a previous model based on a compilation of geological and geophysical information for the Po Plain and surrounding regions of northern Italy, and employ ellipticity data to constrain vS, vP and density within its layers. We show that ellipticity data from ballistic teleseismic wave trains alone constrain the crustal structure well. This leads to MAMBo-E, an updated seismic model of the region’s crust that inherits information available from previous seismic prospection and geological studies, while fitting new seismic data well. MAMBo-E brings new insights into lateral heterogeneity in the region’s subsurface. Compared to MAMBo, it shows overall faster seismic anomalies in the region’s Quaternary, Pliocene and Oligo-Miocene layers and better delineates the seismic structures of the Po Plain at depth. Two low velocity regions are mapped in the Mesozoic layer in the western and eastern parts of the Plain, which seem to correspond to the Monferrato sedimentary basin and to the Ferrara-Romagna thrust system, respectively.

    Large variety in a panel of human colon cancer organoids in response to EZH2 inhibition

    EZH2 inhibitors have gained great interest for their use as anti-cancer therapeutics. However, most research has focused on EZH2 mutant cancers, and recently adverse effects of EZH2 inactivation have come to light. To determine whether colorectal cancer cells respond to EZH2 inhibition and to explore which factors influence the degree of response, we treated a panel of 20 organoid lines derived from human colon tumors with different concentrations of the EZH2 inhibitor GSK126. The resulting responses were associated with mutation status, gene expression and responses to other drugs. We found that the response to GSK126 treatment varied greatly between organoid lines. Response was associated with the mutation status of ATRX and PAX2, and correlated with BIK expression. It also correlated well with response to Nutlin-3a, which inhibits the MDM2-p53 interaction, thereby activating p53 signaling. Sensitivity to EZH2 ablation depended on the presence of wild type p53, as tumor organoids became resistant when p53 was mutated or knocked down. Our exploratory study provides insight into which genetic factors predict sensitivity to EZH2 inhibition. In addition, we show that the response to EZH2 inhibition requires wild type p53. We conclude that a subset of colorectal cancer patients may benefit from EZH2-targeting therapies.

    A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets

    BACKGROUND: Gene selection is an important step when building predictors of disease state based on gene expression data. Gene selection generally improves performance and identifies a relevant subset of genes. Many univariate and multivariate gene selection approaches have been proposed. Frequently the claim is made that genes are co-regulated (due to pathway dependencies) and that multivariate approaches are therefore by definition more desirable than univariate selection approaches. Based on the published performances of all these approaches, a fair comparison of the available results cannot be made. This mainly stems from two factors. First, the results are often biased, since the validation set is in one way or another involved in training the predictor, resulting in optimistically biased performance estimates. Second, the published results are often based on a small number of relatively simple datasets. Consequently no generally applicable conclusions can be drawn. RESULTS: In this study we adopted an unbiased protocol to perform a fair comparison of frequently used multivariate and univariate gene selection techniques, in combination with a range of classifiers. Our conclusions are based on seven gene expression datasets, across several cancer types. CONCLUSION: Our experiments illustrate that, contrary to several previous studies, in five of the seven datasets univariate selection approaches yield consistently better results than multivariate approaches. The simplest multivariate selection approach, the Top Scoring method, achieves the best results on the remaining two datasets. We conclude that the correlation structures, if present, are difficult to extract due to the small number of samples, and that consequently, overly complex gene selection algorithms that attempt to extract these structures are prone to overtraining.
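The selection bias described in this abstract can be illustrated with a small sketch. It is not the study's protocol: it assumes pure-noise "expression" data, a simple t-like univariate score, and a nearest-centroid classifier, and every function name here is hypothetical. The only point is the difference between selecting genes once on all samples and selecting them inside each cross-validation fold.

```python
import random
import statistics

def t_score(values, labels):
    """Absolute two-sample t-like score for one feature (univariate ranking)."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    spread = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / spread

def select_top(X, y, rows, k):
    """Rank features using only the given rows; return the indices of the top k."""
    labels = [y[i] for i in rows]
    scores = [(t_score([X[i][j] for i in rows], labels), j) for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def centroid_predict(X, y, train, test_row, feats):
    """Nearest-centroid prediction restricted to the selected features."""
    best_class, best_dist = None, float("inf")
    for c in (0, 1):
        members = [i for i in train if y[i] == c]
        dist = sum(
            (X[test_row][j] - statistics.mean(X[i][j] for i in members)) ** 2
            for j in feats
        )
        if dist < best_dist:
            best_class, best_dist = c, dist
    return best_class

def cv_accuracy(X, y, k_feats, n_folds, select_inside):
    """Cross-validated accuracy with selection inside the folds or done once up front."""
    n = len(X)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    global_feats = select_top(X, y, list(range(n)), k_feats)  # sees the test samples
    correct = 0
    for fold in folds:
        train = [i for i in range(n) if i not in fold]
        feats = select_top(X, y, train, k_feats) if select_inside else global_feats
        correct += sum(centroid_predict(X, y, train, i, feats) == y[i] for i in fold)
    return correct / n

# Pure noise: no feature carries any real class information.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(40)]
y = [i % 2 for i in range(40)]

biased = cv_accuracy(X, y, k_feats=10, n_folds=5, select_inside=False)
unbiased = cv_accuracy(X, y, k_feats=10, n_folds=5, select_inside=True)
print(biased, unbiased)
```

Because the data contain no signal, an honest estimate should sit near chance level. Selecting features on all samples before cross-validation typically reports a much higher accuracy, which is the optimistic bias the abstract warns about.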

    Identification of cancer genes using a statistical framework for multiexperiment analysis of nondiscretized array CGH data

    Tumor formation is in part driven by DNA copy number alterations (CNAs), which can be measured using microarray-based Comparative Genomic Hybridization (aCGH). Multiexperiment analysis of aCGH data from tumors allows discovery of recurrent CNAs that are potentially causal to cancer development. Until now, multiexperiment aCGH data analysis has been dependent on discretization of measurement data to a gain, loss or no-change state. Valuable biological information is lost when a heterogeneous system such as a solid tumor is reduced to these states. We have developed a new approach which inputs nondiscretized aCGH data to identify regions that are significantly aberrant across an entire tumor set. Our method is based on kernel regression and accounts for the strength of a probe's signal, its local genomic environment and the signal distribution across multiple tumors. In an analysis of 89 human breast tumors, our method showed enrichment for known cancer genes in the detected regions and identified aberrations that are strongly associated with breast cancer subtypes and clinical parameters. Furthermore, we identified 18 recurrent aberrant regions in a new dataset of 19 p53-deficient mouse mammary tumors. These regions, combined with gene expression microarray data, point to known cancer genes and novel candidate cancer genes

    Heterofusion: Fusing genomics data of different measurement scales

    In systems biology, it is becoming increasingly common to measure biochemical entities at different levels of the same biological system. Hence, data fusion problems are abundant in the life sciences. With the availability of a multitude of measuring techniques, one of the central problems is the heterogeneity of the data. In this paper, we discuss a specific form of heterogeneity, namely, that of measurements obtained at different measurement scales, such as binary, ordinal, interval, and ratio‐scaled variables. Three generic fusion approaches are presented of which two are new to the systems biology community. The methods are presented, put in context, and illustrated with a real‐life genomics example

    Inferring single-cell protein levels and cell cycle behavior in heterogeneous cell populations

    Individual cells in a genetically identical population can show highly variable behavior. Single-cell measurements allow us to study this variability, but the available measurement techniques have limitations: live-cell microscopy is typically restricted to one or a few molecular markers, while techniques that simultaneously measure large numbers of molecular markers are destructive and cannot be used to follow cells over time. To help overcome these limitations, we present here scMeMo (single cell Mechanistic Modeler): a mechanistic modeling framework that can leverage diverse sets of measurements in order to infer unobserved variables in heterogeneous single cells. We used this framework to construct a model describing cell cycle progression in human cells, and show that it can predict the levels of several proteins in individual cells, based on live-cell microscopy measurements of only one marker and information learned from other experiments. The framework incorporates an uncertainty calibration step that makes the posterior distributions robust against partial model misspecification. Our modeling framework can be used to integrate information from separate experiments with diverse readouts, and to infer single cell variables that may be difficult to measure directly